y = β₀ + β₁x
Where:
β₀ is the intercept.
β₁ is the slope.
y = β₀ + β₁x + β₂x²
Where:
β₂ is the coefficient of the squared term.
The Curve:
The x² term introduces a curve into the relationship.
If β₂ is positive, the curve opens upward (like a U).
If β₂ is negative, the curve opens downward (like an inverted U).
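The role of the sign of β₂ can be checked on made-up data. The sketch below is illustrative only: the vectors `x`, `y_up`, and `y_down` are hypothetical and are not taken from the dataset analysed here.

```r
# Hypothetical toy data (not from Cleaned_TMA_Data): two exact quadratic curves
x      <- 0:10
y_up   <- 2 - 3 * x + 0.5 * x^2   # beta2 = +0.5, should open upward (U)
y_down <- 2 + 3 * x - 0.5 * x^2   # beta2 = -0.5, should open downward (inverted U)

# Fit y = beta0 + beta1 * x + beta2 * x^2 by least squares
fit_up   <- lm(y_up ~ x + I(x^2))
fit_down <- lm(y_down ~ x + I(x^2))

# The third coefficient is the estimate of beta2
beta2_up   <- unname(coef(fit_up)[3])
beta2_down <- unname(coef(fit_down)[3])

cat("beta2 (U shape):         ", round(beta2_up, 3), "\n")
cat("beta2 (inverted U shape):", round(beta2_down, 3), "\n")
```

Because the toy curves contain no noise, the fitted β₂ recovers the value used to build each curve, and its sign matches the direction in which the curve opens.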
# Descriptive statistics
Cleaned_TMA_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 9 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 262939.9 | 81660.38 | 174370 | 177924 | 311206 | 330800 | 351628 | ▇▁▁▂▇ |
Cleaned_TMA_Data %>% skim(IGF)
| Name | Piped data |
| Number of rows | 9 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| IGF | 0 | 1 | 21963737 | 4371810 | 13748337 | 19752424 | 23144607 | 24450113 | 28142151 | ▂▂▅▇▅ |
# Histograms
ggplot(Cleaned_TMA_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = IGF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
scale_x_continuous(labels = comma)
# Growth Rate (Percentage)
Cleaned_TMA_Data <- Cleaned_TMA_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
)
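The growth-rate expression used in the `mutate()` call above can be traced on a short made-up series. This is a minimal sketch; the vector `x` is hypothetical and does not contain the real Population or IGF figures.

```r
# Hypothetical series (not the real Population or IGF figures)
x <- c(100, 110, 99)

# Period-over-period growth in percent: (x_t - x_{t-1}) / x_{t-1} * 100.
# diff(x) gives x_t - x_{t-1}; x[-length(x)] drops the last value so that
# each difference is divided by the previous period's level.
growth <- c(NA, diff(x) / x[-length(x)] * 100)

print(growth)
```

The leading `NA` is there because the first period has no earlier value to compare with. The calculation assumes the rows are already sorted by `Year`; if they are not, the differences mix up non-adjacent years.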
# Plot of Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = IGF)) +
geom_point(aes(y = IGF), color = "dodgerblue") +
labs(title = "IGF Trend", x = "Year", y = "IGF") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = IGF, color = "IGF")) +
labs(title = "Population vs. IGF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
scale_y_continuous(labels = comma)
# Growth rate plots
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_line(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
geom_point(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
labs(title = "Population Growth vs. IGF Growth", x = "Year", y = "Growth Rate (%)", color = "Type") +
scale_y_continuous(labels = percent_format(scale = 1)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") # Add horizontal line at zero
The histograms show an uneven distribution of population and IGF revenue. Population peaks at about 350,000 (the maximum in the summary above is 351,628). The trend plots show clearly that IGF revenue, which experienced significant changes, is not directly linked to population, which rose steadily.
mod1 <- lm(IGF ~ Population, data = Cleaned_TMA_Data)
summary(mod1)
##
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7570834 -1776081 145785 2087103 7221596
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25475147.92 5368698.90 4.745 0.0021 **
## Population -13.35 19.60 -0.682 0.5175
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4526000 on 7 degrees of freedom
## Multiple R-squared: 0.06222, Adjusted R-squared: -0.07175
## F-statistic: 0.4645 on 1 and 7 DF, p-value: 0.5175
Cleaned_TMA_Data %>%
ggplot(aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = scales::comma)
# The Quadratic Term
Cleaned_TMA_Data$Population_Squared <- Cleaned_TMA_Data$Population^2
# Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_TMA_Data)
summary(mod_quad)
##
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4731775 -3941607 302186 2508018 6084357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82622023.3281143 54289197.1474813 1.522 0.179
## Population -503.5459820 463.8418536 -1.086 0.319
## Population_Squared 0.0009558 0.0009036 1.058 0.331
##
## Residual standard error: 4488000 on 6 degrees of freedom
## Multiple R-squared: 0.2096, Adjusted R-squared: -0.05386
## F-statistic: 0.7956 on 2 and 6 DF, p-value: 0.4938
ggplot(Cleaned_TMA_Data, aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = comma)
Linear Regression:
Coefficients:
Intercept: 25475147.92
Population: -13.35. For each one-person increase in population, IGF is predicted to decrease by approximately 13.35 Ghana Cedis, but the coefficient is not statistically significant.
P-values: Intercept: 0.0021 (significant)
Population: 0.5175 (insignificant)
R-squared: Multiple R-squared: 0.0622
Adjusted R-squared:-0.07175
Interpretation:
The linear model shows a very weak and statistically insignificant
relationship between population and IGF revenue.
Population explains only 6.22% of the variance in IGF.
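A negative adjusted R-squared can look like an error, so it may help to reproduce it by hand. The sketch below applies the standard adjustment formula to the R-squared values reported in the two model summaries; `adj_r2` is a hypothetical helper name introduced only for this illustration.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# R-squared values reported in the model summaries (n = 9 observations)
adj_linear    <- adj_r2(0.06222, n = 9, p = 1)
adj_quadratic <- adj_r2(0.2096,  n = 9, p = 2)

round(c(linear = adj_linear, quadratic = adj_quadratic), 5)
```

With only nine observations the penalty for each extra predictor is large, which is why both adjusted values fall below zero even though the raw R-squared is positive.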
Quadratic Regression:
Coefficients: Intercept: 82622023.3281143
Population: -503.5459820
Population_Squared: 0.0009558
P-values: All coefficients are statistically insignificant (p > 0.05), and the overall model is also statistically insignificant (p-value = 0.4938).
R-squared: Multiple R-squared: 0.2096
Adjusted R-squared: -0.05386
Interpretation:
The quadratic model shows a statistically insignificant relationship between population and IGF revenue. The insignificant quadratic term indicates that, although the relationship does not look linear, a quadratic form does not capture it either.
The R-squared of 0.2096 indicates that the quadratic model explains 20.96% of the variance in IGF, a small improvement over the linear model. Since the model is non-significant and the adjusted R-squared is still negative (-0.05386), the improvement is not meaningful.
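For reference, the turning point implied by the fitted quadratic can be worked out from the reported coefficients. This is a sketch under the assumption that the printed estimates are used as-is; because neither coefficient is significant, the result should not be read as a real population threshold.

```r
# Coefficients copied from the summary(mod_quad) output above
b1 <- -503.5459820    # Population
b2 <- 0.0009558       # Population_Squared

# Setting the derivative b1 + 2 * b2 * x to zero gives the turning point
turning_point <- -b1 / (2 * b2)

# Observed Population range, taken from the skim() output (p0 and p100)
pop_range <- c(174370, 351628)
inside_range <- turning_point >= pop_range[1] && turning_point <= pop_range[2]

cat("Turning point (population):", round(turning_point), "\n")
cat("Inside observed range:", inside_range, "\n")
```

Since the estimate of β₂ is positive, the turning point is a minimum: the fitted curve falls and then rises again within the observed population range.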
# Transformed Model
lm(Ln_IGF ~ Ln_Pop, data = Cleaned_TMA_Data) %>% summary()
##
## Call:
## lm(formula = Ln_IGF ~ Ln_Pop, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40856 -0.06193 0.00984 0.12375 0.32499
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.2207 2.9458 6.525 0.000326 ***
## Ln_Pop -0.1878 0.2369 -0.793 0.453806
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2208 on 7 degrees of freedom
## Multiple R-squared: 0.08243, Adjusted R-squared: -0.04865
## F-statistic: 0.6289 on 1 and 7 DF, p-value: 0.4538
# Scatter Plots (Transformed Data)
ggplot(Cleaned_TMA_Data, aes(x = Ln_Pop, y = Ln_IGF)) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")
#GAM
gam(IGF ~ s(Population, k = 9) + Ln_Tt_Revenue + CollRate_Fees, data = Cleaned_TMA_Data) %>% summary()
##
## Family: gaussian
## Link function: identity
##
## Formula:
## IGF ~ s(Population, k = 9) + Ln_Tt_Revenue + CollRate_Fees
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -294996765 71285289 -4.138 0.00901 **
## Ln_Tt_Revenue 18303341 3941990 4.643 0.00562 **
## CollRate_Fees -12855 31056 -0.414 0.69609
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Population) 1 1 2.884 0.15
##
## R-sq.(adj) = 0.911 Deviance explained = 94.5%
## GCV = 3.0482e+12 Scale est. = 1.6935e+12 n = 9
After the log transformation the result is still insignificant (p-value: 0.4538; Multiple R-squared: 0.08243).
Even with the flexible Generalized Additive Model (GAM), the smooth term for population was not statistically significant (p = 0.15). Its effective degrees of freedom (edf = 1) suggest that the smooth was reduced to a straight line.
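The slope of the log-log model can be read as an elasticity. The sketch below assumes that `Ln_IGF` and `Ln_Pop` are logarithms of `IGF` and `Population` (the column names suggest this, but the source does not show how they were created); `pct_change_igf` is a hypothetical helper.

```r
# Slope from the log-log output above (Ln_IGF ~ Ln_Pop)
elasticity <- -0.1878

# Multiplying population by (1 + g) multiplies predicted IGF by (1 + g)^elasticity
pct_change_igf <- function(g, b = elasticity) {
  ((1 + g)^b - 1) * 100
}

cat("Predicted % change in IGF for +1% population: ", round(pct_change_igf(0.01), 3), "\n")
cat("Predicted % change in IGF for +10% population:", round(pct_change_igf(0.10), 3), "\n")
```

A 1% rise in population is therefore associated with a fall in IGF of a little under 0.2%, but with p = 0.4538 this estimate is not distinguishable from zero.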
# Scatter Plot
ggplot(Cleaned_TMA_Data, aes(x = Population, y = IGF)) +
geom_point() +
labs(title = "Population vs. IGF Revenue", x = "Population", y = "IGF Revenue")
# Residual
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Linear) ", x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals(Linear)", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals")
shapiro.test(resid(mod1))
##
## Shapiro-Wilk normality test
##
## data: resid(mod1)
## W = 0.98987, p-value = 0.9959
# Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")
shapiro.test(resid(mod_quad))
##
## Shapiro-Wilk normality test
##
## data: resid(mod_quad)
## W = 0.93329, p-value = 0.5132
# Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Quadratic Model)")
# Durbin-Watson Test (Autocorrelation)
dwtest(mod1)
##
## Durbin-Watson test
##
## data: mod1
## DW = 1.3918, p-value = 0.08374
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_quad)
##
## Durbin-Watson test
##
## data: mod_quad
## DW = 1.8379, p-value = 0.1227
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Homoscedasticity)
bptest(mod1)
##
## studentized Breusch-Pagan test
##
## data: mod1
## BP = 1.271, df = 1, p-value = 0.2596
bptest(mod_quad)
##
## studentized Breusch-Pagan test
##
## data: mod_quad
## BP = 2.3299, df = 2, p-value = 0.3119
# Variance Inflation Factor (VIF) - Multicollinearity
vif(mod_quad)
## Population Population_Squared
## 569.833 569.833
For the linear model all the assumptions are met. For the quadratic model, multicollinearity is present (VIF of about 570 for both terms) and a trend can be found in the residuals, so the quadratic model violates two assumptions.
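A very high VIF is expected whenever a raw variable and its square enter the same model, because the two are almost perfectly correlated when the variable is far from zero. One common remedy, which is not part of the original analysis, is to centre the predictor before squaring it. The sketch below uses hypothetical, evenly spaced values on a scale similar to Population; `vif_two` is a hypothetical helper.

```r
# Hypothetical predictor on a similar scale to Population (evenly spaced toy values)
x <- seq(170000, 350000, length.out = 9)

# With exactly two predictors, VIF = 1 / (1 - r^2), where r is their correlation
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

# Raw x and x^2 are almost perfectly correlated when x is far from zero
vif_raw <- vif_two(x, x^2)

# Centring x before squaring removes most of that correlation
xc <- x - mean(x)
vif_centred <- vif_two(xc, xc^2)

cat("VIF, raw terms:    ", round(vif_raw, 1), "\n")
cat("VIF, centred terms:", round(vif_centred, 3), "\n")
```

The toy values are symmetric around their mean, so the centred VIF drops to essentially 1; with real data the reduction would be smaller. Centring changes the linear coefficient but leaves the fitted values and the test of the squared term unchanged.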
Therefore, from the analysis so far, there is a statistically insignificant negative, and possibly non-linear, relationship between population and IGF revenue, even though the linear model did not violate any assumption. The relationship cannot be captured by the linear model or by the other specifications tried (quadratic, log-log, and even the GAM). The scatter plots indicate two clusters in the population and IGF data.
Cleaned_TMA_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 9 |
| Number of columns | 79 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 262939.9 | 81660.38 | 174370 | 177924 | 311206 | 330800 | 351628 | ▇▁▁▂▇ |
Cleaned_TMA_Data %>% skim(DACF)
| Name | Piped data |
| Number of rows | 9 |
| Number of columns | 79 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| DACF | 0 | 1 | 2030075 | 470625.9 | 1326091 | 1761596 | 2037955 | 2254656 | 2863934 | ▂▇▅▅▂ |
# Histograms
ggplot(Cleaned_TMA_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population")
ggplot(Cleaned_TMA_Data, aes(x = DACF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")
#Growth Rates and Per Capita Values
Cleaned_TMA_Data <- Cleaned_TMA_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
DACF_Growth_Rate = c(NA, diff(DACF) / DACF[-length(DACF)] * 100)
)
# Plotting Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = DACF)) +
geom_point(aes(y = DACF), color = "dodgerblue") +
labs(title = "DACF Trend", x = "Year", y = "DACF") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = DACF, color = "DACF")) +
geom_point(aes(y = DACF, color = "DACF")) +
labs(title = "Population vs. DACF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
scale_y_continuous(labels = scales::comma)
# Plotting Growth Rates
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_line(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
geom_point(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
labs(title = "Population Growth vs. DACF Growth", x = "Year", y = "Growth Rate (%)", color = "Type")+
geom_hline(yintercept = 0, linetype = "dashed", color = "red")
The histograms show an uneven distribution of population and DACF revenue. The trend plots show clearly that DACF revenue, which experienced significant changes, moves in the opposite direction to population, which rose steadily, so the two are not directly linked.
mod2 <- lm(DACF ~ Population, data = Cleaned_TMA_Data)
summary(mod2)
##
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -610654 -419869 139098 336566 664509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2538511.876 562212.239 4.515 0.00275 **
## Population -1.934 2.052 -0.942 0.37740
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 474000 on 7 degrees of freedom
## Multiple R-squared: 0.1126, Adjusted R-squared: -0.0142
## F-statistic: 0.888 on 1 and 7 DF, p-value: 0.3774
Cleaned_TMA_Data %>%
ggplot(aes(x = Population, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) + # Added confidence intervals
labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
scale_y_continuous(labels = scales::comma)
lm(DACF ~ Population + Population_Squared, data = Cleaned_TMA_Data) %>% summary()
##
## Call:
## lm(formula = DACF ~ Population + Population_Squared, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -540806 -444574 128239 375051 645015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4467282.59590058 6141838.00509749 0.727 0.494
## Population -18.47817181 52.47529295 -0.352 0.737
## Population_Squared 0.00003226 0.00010222 0.316 0.763
##
## Residual standard error: 507700 on 6 degrees of freedom
## Multiple R-squared: 0.1271, Adjusted R-squared: -0.1639
## F-statistic: 0.4367 on 2 and 6 DF, p-value: 0.6652
There is a statistically insignificant negative relationship between population and DACF revenue. As population increases, DACF tends to decrease. Population explains only 11.26% of the variance in DACF. The quadratic model is not significant either.
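The significance statement for the linear DACF model can be reproduced from the reported slope and standard error. This is a sketch that uses the rounded values printed in the `summary(mod2)` output, so the last digits may differ slightly from R's own results.

```r
# Slope and standard error copied from summary(mod2) above
estimate <- -1.934
std_err  <- 2.052
df_resid <- 7          # 9 observations minus 2 estimated coefficients

# t statistic and two-sided p-value
t_value <- estimate / std_err
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval for the slope
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_err

cat("t =", round(t_value, 3), " p =", round(p_value, 4), "\n")
cat("95% CI:", round(ci, 2), "\n")
```

The confidence interval spans zero, which is another way of seeing that the data are compatible with no relationship, or with either a negative or a positive one.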
#Scatter Plot
ggplot(Cleaned_TMA_Data, aes(x = Population, y = DACF)) +
geom_point() +
labs(title = "Population vs. DACF Revenue",
x = "Population", y = "DACF Revenue")
# Residual
ggplot(data = data.frame(residuals = residuals(mod2),
fitted = fitted(mod2)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted",
x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(sample = residuals)) +
stat_qq() +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals ")
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.93748, p-value = 0.5556
# Autocorrelation
dwtest(mod2)
##
## Durbin-Watson test
##
## data: mod2
## DW = 2.8719, p-value = 0.8767
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)
bptest(mod2)
##
## studentized Breusch-Pagan test
##
## data: mod2
## BP = 2.9822, df = 1, p-value = 0.08419
# Multicollinearity
# This is a simple linear regression with one predictor (Population), so multicollinearity is not an issue.
# Multivariate Normality
# With a single predictor (Population), this is not an issue either.
The scatter plot shows two clusters and a non-linear relationship. It suggests that as population increases, DACF revenue tends to decrease. The histogram of residuals shows a potential violation of the normality assumption, though the Shapiro-Wilk test did not detect it (p = 0.5556). The Durbin-Watson test revealed no positive autocorrelation (p = 0.8767), and the Breusch-Pagan test did not reject homoscedasticity at the 5% level (p = 0.08419).
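To make the Durbin-Watson statistic easier to read, the sketch below computes it directly from residuals. A value near 2 indicates no first-order autocorrelation, values below 2 point to positive autocorrelation, and values above 2 point to negative autocorrelation. The residual vectors are hypothetical and `dw_stat` is a helper introduced only for this illustration.

```r
# Durbin-Watson statistic from a vector of residuals ordered in time
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Hypothetical residuals (not from mod2) that flip sign every period,
# i.e. strong negative autocorrelation
e_alternating <- c(1, -1, 1, -1, 1, -1)

# Hypothetical residuals that drift slowly, i.e. positive autocorrelation
e_drifting <- c(-3, -2, -1, 1, 2, 3)

cat("DW, alternating residuals:", round(dw_stat(e_alternating), 3), "\n")
cat("DW, drifting residuals:   ", round(dw_stat(e_drifting), 3), "\n")
```

The reported DW of 2.8719 lies above 2. The test output states the alternative "true autocorrelation is greater than 0", so it only checks for positive autocorrelation; `dwtest()` in lmtest also accepts `alternative = "two.sided"` if negative autocorrelation is a concern.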
#Transformed Models
log_mod2 <- lm(log(DACF) ~ log(Population), data = Cleaned_TMA_Data)
summary(log_mod2 )
#
# Call:
# lm(formula = log(DACF) ~ log(Population), data = Cleaned_TMA_Data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.35558 -0.18516 0.08728 0.18003 0.29081
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 17.1780 3.1619 5.433 0.000974 ***
# log(Population) -0.2154 0.2542 -0.847 0.424838
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.237 on 7 degrees of freedom
# Multiple R-squared: 0.09302, Adjusted R-squared: -0.03655
# F-statistic: 0.7179 on 1 and 7 DF, p-value: 0.4248
sqrt_mod2 <- lm( sqrt(DACF)~sqrt(Population), data = Cleaned_TMA_Data )
summary(sqrt_mod2)
#
# Call:
# lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_TMA_Data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -232.01 -139.20 55.13 123.50 219.16
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 1742.8430 369.6621 4.715 0.00217 **
# sqrt(Population) -0.6440 0.7209 -0.893 0.40134
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 166.5 on 7 degrees of freedom
# Multiple R-squared: 0.1023, Adjusted R-squared: -0.0259
# F-statistic: 0.7981 on 1 and 7 DF, p-value: 0.4013
# Scatter Plots (Transformed Data)
ggplot(Cleaned_TMA_Data, aes(x = log(Population), y = log(DACF))) +
geom_point() +
geom_smooth(method = "lm")+
labs(title = "Log(Population) vs. Log(DACF Revenue)",
x = "Log(Population)", y = "Log(DACF Revenue)")
ggplot(Cleaned_TMA_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
geom_point() +
geom_smooth(method = "lm")+
labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")
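Neither the log-log nor the square root model is statistically significant, and neither improved the fit over the linear model (R-squared of 0.0930 and 0.1023 versus 0.1126, although R-squared values for differently transformed responses are only roughly comparable).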
# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform diagnostics for each model
perform_diagnostics(mod2, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 2.8719, p-value = 0.8767
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Linear Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 2.9822, df = 1, p-value = 0.08419
perform_diagnostics(log_mod2, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 2.6567, p-value = 0.7738
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.87598, df = 1, p-value = 0.3493
perform_diagnostics(sqrt_mod2, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 2.7683, p-value = 0.831
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square Root Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.8828, df = 1, p-value = 0.17
shapiro.test(resid(mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(mod2)
## W = 0.93748, p-value = 0.5556
shapiro.test(resid(log_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(log_mod2)
## W = 0.93697, p-value = 0.5504
shapiro.test(resid(sqrt_mod2))
##
## Shapiro-Wilk normality test
##
## data: resid(sqrt_mod2)
## W = 0.9401, p-value = 0.5829
The diagnostic tests indicate that all three models satisfy the assumptions, but the plots show a slight violation of the normality assumption.
Therefore, none of the three models captures the relationship, and the transformations did not change the situation. The small dataset (nine observations) might be a hindrance, so the relationship remains unclear given this data.
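The point about the dataset being a hindrance can be quantified. The sketch below works out how large R-squared would need to be for a one-predictor model to reach significance at the 5% level with nine observations. It is a back-of-the-envelope calculation, not part of the original analysis.

```r
n  <- 9                 # observations in the dataset
df <- n - 2             # residual degrees of freedom for one predictor

# Critical t value for a two-sided test at the 5% level
t_crit <- qt(0.975, df = df)

# For simple regression, t^2 = R^2 * df / (1 - R^2), so solving for R^2:
r2_min <- t_crit^2 / (t_crit^2 + df)

cat("Critical |t|:", round(t_crit, 3), "\n")
cat("Smallest R-squared significant at 5%:", round(r2_min, 3), "\n")
```

With this sample size, population would need to explain roughly 44% of the variance before the slope became significant, far more than the 6% to 11% observed in the linear models above.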
# Calculate descriptive statistics
desc_stats <- Cleaned_TMA_Data %>%
summarize(
Population_mean = mean(Population),
Population_sd = sd(Population),
Population_min = min(Population),
Population_max = max(Population),
Capital_Expenditure_mean = mean(Capital_Expenditure),
Capital_Expenditure_sd = sd(Capital_Expenditure),
Capital_Expenditure_min = min(Capital_Expenditure),
Capital_Expenditure_max = max(Capital_Expenditure),
Recrrent_Expenditure_mean = mean(Recrrent_Expenditure),
Recrrent_Expenditure_sd = sd(Recrrent_Expenditure),
Recrrent_Expenditure_min = min(Recrrent_Expenditure),
Recrrent_Expenditure_max = max(Recrrent_Expenditure)
)
cat("
## Descriptive Statistics
| Statistic | Population | Capital Expenditure | Recurrent Expenditure |
|------------------------|------------|---------------------|-----------------------|
| Mean |", format(desc_stats$Population_mean, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_mean, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_mean, big.mark = ",", digits = 2), "|
| Standard Deviation |", format(desc_stats$Population_sd, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_sd, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_sd, big.mark = ",", digits = 2), "|
| Minimum |", format(desc_stats$Population_min, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_min, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_min, big.mark = ",", digits = 2), "|
| Maximum |", format(desc_stats$Population_max, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_max, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_max, big.mark = ",", digits = 2), "|
\n")
##
## ## Descriptive Statistics
##
## | Statistic | Population | Capital Expenditure | Recurrent Expenditure |
## |------------------------|------------|---------------------|-----------------------|
## | Mean | 262,940 | 8,430,470 | 15,377,435 |
## | Standard Deviation | 81,660 | 3,635,231 | 5,408,861 |
## | Minimum | 174,370 | 4,724,892 | 5,140,669 |
## | Maximum | 351,628 | 16,210,705 | 24,388,461 |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_TMA_Data, aes(x = Capital_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "skyblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Recurrent Expenditure Histogram
rec_hist <- ggplot(Cleaned_TMA_Data, aes(x = Recrrent_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "lightgreen", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Recurrent Expenditure", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Population Histogram
pop_hist <- ggplot(Cleaned_TMA_Data, aes(x = Population)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Population", x = "Population", y = "Density") +
scale_x_continuous(labels = comma)
cap_hist
rec_hist
pop_hist
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
labs(title = " Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
theme(axis.title.y.right = element_text(vjust=2))
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_line(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
# Calculate Per Capita Values
Cleaned_TMA_Data$Capital_Exp_Per_Capita <- Cleaned_TMA_Data$Capital_Expenditure / Cleaned_TMA_Data$Population
# Plotting Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
labs(title = "Population and Capital Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
theme(axis.title.y.right = element_text(vjust=2))
# Per Capita Analysis
average_capita <- mean(Cleaned_TMA_Data$Capital_Exp_Per_Capita)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_hline(yintercept = average_capita, linetype = "dashed", color = "red")+
labs(title = "Capital Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
Cleaned_TMA_Data$Recrrent_Exp_Per_Capita <- Cleaned_TMA_Data$Recrrent_Expenditure / Cleaned_TMA_Data$Population
average_rec_capita <- mean(Cleaned_TMA_Data$Recrrent_Exp_Per_Capita)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_hline(yintercept = average_rec_capita, linetype = "dashed", color = "red") +
labs(title = "Recurrent Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
theme(axis.title.y.right = element_text(vjust=2))
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(mod3)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3867539 -2695139 -1036731 2815735 7032439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5532959.18 4466411.60 1.239 0.255
## Population 11.02 16.30 0.676 0.521
##
## Residual standard error: 3765000 on 7 degrees of freedom
## Multiple R-squared: 0.06128, Adjusted R-squared: -0.07283
## F-statistic: 0.4569 on 1 and 7 DF, p-value: 0.5208
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10603509 -1652861 -983931 2407651 8417481
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13379529.869 6813765.355 1.964 0.0903 .
## Population 7.598 24.870 0.306 0.7689
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5744000 on 7 degrees of freedom
## Multiple R-squared: 0.01316, Adjusted R-squared: -0.1278
## F-statistic: 0.09335 on 1 and 7 DF, p-value: 0.7689
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_TMA_Data)
summary(mod_cap)
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3867539 -2695139 -1036731 2815735 7032439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5532959.18 4466411.60 1.239 0.255
## Population 11.02 16.30 0.676 0.521
##
## Residual standard error: 3765000 on 7 degrees of freedom
## Multiple R-squared: 0.06128, Adjusted R-squared: -0.07283
## F-statistic: 0.4569 on 1 and 7 DF, p-value: 0.5208
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_TMA_Data)
summary(mod_rec)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10603509 -1652861 -983931 2407651 8417481
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13379529.869 6813765.355 1.964 0.0903 .
## Population 7.598 24.870 0.306 0.7689
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5744000 on 7 degrees of freedom
## Multiple R-squared: 0.01316, Adjusted R-squared: -0.1278
## F-statistic: 0.09335 on 1 and 7 DF, p-value: 0.7689
Cleaned_TMA_Data %>%
ggplot(aes(x = Population, y = Capital_Expenditure)) +
geom_point()+
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = scales::comma)
Cleaned_TMA_Data %>%
ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship between Population and Recurrent Expenditure") +
scale_y_continuous(labels = scales::comma)
The linear regressions show no statistically significant linear relationship between Population and either expenditure. Both models have very small R-squared values (0.06128 for Capital Expenditure and 0.01316 for Recurrent Expenditure) and high p-values (0.5208 and 0.7689), which indicates poor model fit.
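The negative adjusted R-squared values in the summaries above follow directly from the adjustment formula, 1 - (1 - R²)(n - 1)/(n - p - 1). A minimal sketch, using the rounded figures printed above (n = 9 observations, p = 1 predictor); the helper name `adj_r2` is hypothetical and not part of the original analysis:

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
# adj_r2 is an illustrative helper, not a function from the analysis.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Rounded R-squared values copied from the printed summaries
adj_cap <- adj_r2(0.06128, n = 9, p = 1)  # Capital Expenditure model
adj_rec <- adj_r2(0.01316, n = 9, p = 1)  # Recurrent Expenditure model

print(round(c(capital = adj_cap, recurrent = adj_rec), 4))
```

Both values are negative because each R-squared is smaller than p/(n - 1) = 0.125, the penalty for fitting one predictor to nine observations.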
# Diagnostic Function
perform_diagnostics <- function(model, model_name) {
# Residuals vs. Fitted
plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")
# Q-Q Plot of Residuals
plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))
# Durbin-Watson Test
dw_test <- dwtest(model)
print(paste("Durbin-Watson Test (", model_name, "):"))
print(dw_test)
# Breusch-Pagan Test
bp_test <- bptest(model)
print(paste("Breusch-Pagan Test (", model_name, "):"))
print(bp_test)
# Print VIF (if applicable)
if (length(coef(model)) > 2) { # Check for multiple predictors
vif_result <- vif(model)
print(paste("VIF (", model_name, "):"))
print(vif_result)
}
# Arrange plots
grid.arrange(plot1, plot2, plot3, nrow = 1)
}
# Perform Diagnostics
# Capital Expenditure
perform_diagnostics(mod_cap, "Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.8832, p-value = 0.2945
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.26711, df = 1, p-value = 0.6053
# Recurrent Expenditure
perform_diagnostics(mod_rec, "Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Recurrent Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.1903, p-value = 0.03879
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Recurrent Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.92089, df = 1, p-value = 0.3372
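From the above tests, the Recurrent Expenditure model violates the no-autocorrelation assumption (DW = 1.1903, p = 0.03879), while the Breusch-Pagan tests show no evidence of heteroscedasticity in either model (p = 0.6053 and p = 0.3372). The Capital Expenditure model violates the normality assumption, judged from the residual histogram and Q-Q plot.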
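The `perform_diagnostics` function runs no formal normality test, so the normality assessment rests on the residual plots. A minimal sketch, on simulated data rather than the TMA data, of how the Durbin-Watson statistic is computed and how a Shapiro-Wilk test (base R `shapiro.test`) could complement the plots; all object names here are illustrative:

```r
set.seed(1)

# Simulated stand-in for nine yearly observations (not the TMA data)
x <- 1:9
y <- 5 + 0.3 * x + rnorm(9)
fit <- lm(y ~ x)
e <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by their sum of squares. Values near 2 suggest no
# first-order autocorrelation; values well below 2 suggest positive
# autocorrelation.
dw_stat <- sum(diff(e)^2) / sum(e^2)

# Shapiro-Wilk test of the residuals (null hypothesis: normally distributed)
sw <- shapiro.test(e)

print(dw_stat)
print(sw$p.value)
```

With only nine residuals, a formal normality test has little power, so it is best read alongside the Q-Q plot rather than instead of it.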
# Log Transformation for Recurrent Expenditure
log_rec_mod <- lm(log(Recrrent_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(log_rec_mod)
##
## Call:
## lm(formula = log(Recrrent_Expenditure) ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02458 -0.09075 0.02795 0.24488 0.53189
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.47250443341 0.55699486589 29.574 0.000000013 ***
## Population 0.00000001531 0.00000203298 0.008 0.994
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4696 on 7 degrees of freedom
## Multiple R-squared: 8.104e-06, Adjusted R-squared: -0.1428
## F-statistic: 5.673e-05 on 1 and 7 DF, p-value: 0.9942
perform_diagnostics(log_rec_mod, "Log Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Log Recurrent Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.0731, p-value = 0.02204
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log Recurrent Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.7591, df = 1, p-value = 0.3836
log_cap_mod <- lm(log(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(log_cap_mod)
##
## Call:
## lm(formula = log(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49193 -0.37141 -0.05009 0.39407 0.63900
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.514347363 0.504419752 30.757 0.00000000991 ***
## Population 0.000001354 0.000001841 0.735 0.486
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4252 on 7 degrees of freedom
## Multiple R-squared: 0.07171, Adjusted R-squared: -0.06091
## F-statistic: 0.5407 on 1 and 7 DF, p-value: 0.486
perform_diagnostics(log_cap_mod, "Log capital Expenditure Model")
## [1] "Durbin-Watson Test ( Log capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.5551, p-value = 0.1373
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Log capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.15613, df = 1, p-value = 0.6927
Cleaned_TMA_Data$Ln_Population <- log(Cleaned_TMA_Data$Population)
Cleaned_TMA_Data$Ln_Capital_Expenditure <- log(Cleaned_TMA_Data$Capital_Expenditure)
ggplot(Cleaned_TMA_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Log(Population) vs. Log(Capital Expenditure)",
x = "Log(Population)", y = "Log(Capital Expenditure)")
ggplot(Cleaned_TMA_Data, aes(x = log(Population), y = log(Recrrent_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Log(Population) vs. Log(Recurrent Expenditure)",
x = "Log(Population)", y = "Log(Recurrent Expenditure)")
# Square root transformation for Capital Expenditure
sqrt_cap_mod <- lm(sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(sqrt_cap_mod)
##
## Call:
## lm(formula = sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -681.5 -497.2 -121.5 523.5 1050.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2352.675117 735.947371 3.197 0.0151 *
## Population 0.001883 0.002686 0.701 0.5059
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 620.4 on 7 degrees of freedom
## Multiple R-squared: 0.0656, Adjusted R-squared: -0.06788
## F-statistic: 0.4915 on 1 and 7 DF, p-value: 0.5059
perform_diagnostics(sqrt_cap_mod, "Square root Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Square root Capital Expenditure Model ):"
##
## Durbin-Watson test
##
## data: model
## DW = 1.7126, p-value = 0.205
## alternative hypothesis: true autocorrelation is greater than 0
##
## [1] "Breusch-Pagan Test ( Square root Capital Expenditure Model ):"
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 0.018825, df = 1, p-value = 0.8909
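After the log and square-root transformations, both expenditure models remain non-significant (p = 0.9942 for log Recurrent Expenditure, p = 0.486 for log Capital Expenditure, and p = 0.5059 for square-root Capital Expenditure) and still violate some of the assumptions. For example, the log Recurrent Expenditure model still shows positive autocorrelation (DW = 1.0731, p = 0.02204).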
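In a model with a logged response, the slope is read as a proportional change rather than a change in cedis. A minimal sketch of that arithmetic, using the point estimate printed for `log_cap_mod`; the 10,000-person increase is an illustrative assumption, and the estimate itself is not statistically significant:

```r
# Slope copied from the printed summary of log(Capital_Expenditure) ~ Population
b_pop <- 0.000001354

# In a log-response model, an increase of `delta` in the predictor multiplies
# the expected response by exp(b * delta).
delta <- 10000  # illustrative population increase (an assumption)
pct_change <- (exp(b_pop * delta) - 1) * 100

print(round(pct_change, 2))
```

Read this way, the point estimate corresponds to roughly a 1.4 percent difference in capital expenditure per 10,000 additional people, a figure that the wide standard error leaves indistinguishable from zero.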
Cleaned_TMA_Data$Recrrent_Expenditure_squared <- Cleaned_TMA_Data$Recrrent_Expenditure^2
Cleaned_TMA_Data$Capital_Expenditure_squared <- Cleaned_TMA_Data$Capital_Expenditure^2
# Population_Squared is assumed to have been created earlier as Population^2
mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_TMA_Data)
# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared,
## data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2778188 -2565965 -1787984 2500686 7219878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30981222.2575110 48075740.0244342 0.644 0.543
## Population -207.2691431 410.7546535 -0.505 0.632
## Population_Squared 0.0004256 0.0008002 0.532 0.614
##
## Residual standard error: 3974000 on 6 degrees of freedom
## Multiple R-squared: 0.1035, Adjusted R-squared: -0.1953
## F-statistic: 0.3465 on 2 and 6 DF, p-value: 0.7204
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared,
## data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6717413 -2654785 -655915 2983799 6860831
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 91602005.532075 67845678.733128 1.350 0.226
## Population -663.374432 579.667172 -1.144 0.296
## Population_Squared 0.001308 0.001129 1.159 0.291
##
## Residual standard error: 5609000 on 6 degrees of freedom
## Multiple R-squared: 0.1936, Adjusted R-squared: -0.07525
## F-statistic: 0.7201 on 2 and 6 DF, p-value: 0.5245
# Scatter Plots (Quadratic Fit)
ggplot(Cleaned_TMA_Data, aes(x = Population, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
scale_y_continuous(labels = comma)
The quadratic models are also non-significant (p = 0.7204 for Capital Expenditure and p = 0.5245 for Recurrent Expenditure).
Therefore, the regression analysis above finds no statistically significant linear relationship between population and either expenditure, and the transformed and quadratic models also fail to capture the relationship with statistically significant results.
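One way to check whether a squared term adds anything over the straight-line fit is a nested-model comparison. A minimal sketch on simulated data (not the TMA data), assuming a single response variable; the object names are illustrative:

```r
set.seed(42)

# Simulated data with a mild curve (not the TMA data)
x <- seq(1, 9)
y <- 2 + 0.5 * x + 0.2 * x^2 + rnorm(9)

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# Partial F-test: does adding the squared term reduce the residual sum of
# squares by more than chance alone would explain?
cmp <- anova(fit_lin, fit_quad)
print(cmp)

# p-value for the added squared term (second row of the comparison table)
p_quad <- cmp[["Pr(>F)"]][2]
```

With a single added term, this F-test is equivalent to the t-test on the squared coefficient, so it should agree with the p-value shown for the squared term in the model summary.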
The next analysis uses total revenue growth rate and infrastructure delivery (capital expenditure per capita).
# Descriptive statistics
Cleaned_TMA_Data %>% skim(Capital_Exp_Per_Capita)
| Name | Piped data |
| Number of rows | 9 |
| Number of columns | 85 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Capital_Exp_Per_Capita | 0 | 1 | 34.25 | 16.48 | 16.37 | 23.81 | 25.78 | 49 | 58.96 | ▇▇▁▂▅ |
Cleaned_TMA_Data %>% skim(TtRev_Growth_Rate)
| Name | Piped data |
| Number of rows | 9 |
| Number of columns | 85 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TtRev_Growth_Rate | 0 | 1 | 6.18 | 25.09 | -53.42 | 7.9 | 13.15 | 20.68 | 30.05 | ▂▁▂▆▇ |
# Histograms
ggplot(Cleaned_TMA_Data, aes(x = Capital_Exp_Per_Capita)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = TtRev_Growth_Rate)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate") +
scale_x_continuous(labels = percent_format(scale = 1))
# Plotting Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
geom_point(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
labs(
title = "Total Revenue Growth Rate vs. Capital Expenditure Per Capita",
x = "Year",
y = "Total Revenue Growth Rate (%)"
) +
scale_y_continuous(
labels = percent_format(scale = 1),
sec.axis = sec_axis(~., name = "Capital Expenditure Per Capita")
) +
scale_color_manual(
values = c("Total Revenue Growth Rate" = "lightseagreen", "Capital Expenditure Per Capita" = "indianred"),
name = "Type"
) +
theme(axis.title.y.right = element_text(vjust = 2))
The histograms show an uneven distribution of capital expenditure per capita. The trend plot shows that total revenue growth rate, which fluctuated sharply (from -53.42% to 30.05%), is not directly linked to the trend of capital expenditure per capita.
mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TMA_Data)
summary(mod5)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.862 -10.520 -8.464 14.772 24.719
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.259021 6.069954 5.644 0.000779 ***
## TtRev_Growth_Rate -0.001256 0.248289 -0.005 0.996105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.62 on 7 degrees of freedom
## Multiple R-squared: 3.655e-06, Adjusted R-squared: -0.1429
## F-statistic: 2.558e-05 on 1 and 7 DF, p-value: 0.9961
ggplot(Cleaned_TMA_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
x = "Total Revenue Growth Rate (%)",
y = "Capital Expenditure Per Capita")
The regression results show no statistically significant relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.9961) is well above the 0.05 significance level. Changes in revenue growth therefore do not significantly predict changes in capital expenditure per capita in this model. The R-squared (3.655e-06) indicates that almost 0% of the variation in capital expenditure per capita is explained by total revenue growth rate.
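The same conclusion can be read from a confidence interval for the slope. A minimal sketch that rebuilds the 95% interval from the estimate, standard error, and residual degrees of freedom printed for `mod5`; the variable names are illustrative:

```r
# Figures copied from the printed summary of mod5
est <- -0.001256   # slope for TtRev_Growth_Rate
se  <- 0.248289    # its standard error
df_resid <- 7      # residual degrees of freedom

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df_resid)
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)

print(round(ci, 3))
```

The interval spans zero by a wide margin on both sides, which is consistent with the p-value of 0.9961. On the fitted object, `confint(mod5)` should report the same interval up to rounding.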
Cleaned_TMA_Data$Expenditure_Growth <- c(NA, diff(Cleaned_TMA_Data$Total_Expenditure) / Cleaned_TMA_Data$Total_Expenditure[-nrow(Cleaned_TMA_Data)]) * 100
mod6 <- lm(Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TMA_Data)
summary(mod6)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.031 -11.888 -7.608 15.093 21.471
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.63881 6.61269 5.692 0.00127 **
## Expenditure_Growth -0.06207 0.14345 -0.433 0.68035
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.12 on 6 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.03026, Adjusted R-squared: -0.1314
## F-statistic: 0.1872 on 1 and 6 DF, p-value: 0.6803
ggplot(Cleaned_TMA_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Expenditure Growth vs. Capital Expenditure (Per Capita)",
x = "Expenditure Growth Rate (%)",
y = "Capital Expenditure Per Capita")
The linear regression shows no statistically significant relationship between expenditure growth and capital expenditure per capita (p = 0.6803, R-squared = 0.03026).
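The `Expenditure_Growth` variable is a year-on-year percentage change, which is why the first year has no value and the model for `mod6` reports one observation deleted due to missingness. A minimal sketch of the same calculation on made-up totals (not the TMA data):

```r
# Illustrative yearly totals (made-up values, not the TMA data)
total <- c(100, 110, 99, 120)

# Year-on-year growth: change from the previous year divided by the previous
# year's value. The first year has no predecessor, so its growth rate is NA.
growth_pct <- c(NA, diff(total) / total[-length(total)]) * 100

print(round(growth_pct, 2))
```

Because `lm()` drops rows with missing values by default, a regression on a growth rate built this way always uses one observation fewer than the raw series.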
# no variables
# Expenditure Composition:
Cleaned_TMA_Data$CapExp_Pct <- (Cleaned_TMA_Data$Capital_Expenditure / Cleaned_TMA_Data$Total_Expenditure)
Cleaned_TMA_Data$CapExp_Rev_Ratio <- (Cleaned_TMA_Data$Capital_Expenditure / Cleaned_TMA_Data$Total_Revenue)
# Expenditure Composition
ggplot(Cleaned_TMA_Data, aes(x = Year, y = CapExp_Pct)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "Capital Expenditure as Percentage of Total Expenditure",
x = "Year",
y = "Percentage") +
scale_y_continuous(labels = percent_format(accuracy = 1))
# Trends of Revenue and Expenditure over the years.
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure")) +
geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
labs(title = "Revenue and Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)", color = "Type") +
scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
geom_line(aes(y = Recrrent_Expenditure , color = "Recurrent Expenditure"), size = 1) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
labs(
title = "Revenue and Expenditure Trends",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "blue",
"Other Sources" = "skyblue",
"IGF" = "green",
"DACF" = "darkgray",
"Capital Expenditure" = "purple",
"Total Expenditure" = "red",
"Recurrent Expenditure" = "yellow"
)
) +
scale_y_continuous(labels = scales::comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_TMA_Data, aes(x = Year, y = IGF_TE)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1))
# CapExp_Rev_Ratio plot.
ggplot(Cleaned_TMA_Data, aes(x = Year, y = CapExp_Rev_Ratio)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Capital Expenditure to Total Revenue Ratio Over Years",
x = "Year",
y = "Ratio (Capital Expenditure/Total Revenue)"
) +
scale_y_continuous(labels = comma)
cor.test(Cleaned_TMA_Data$Total_Expenditure, Cleaned_TMA_Data$Total_Revenue)
##
## Pearson's product-moment correlation
##
## data: Cleaned_TMA_Data$Total_Expenditure and Cleaned_TMA_Data$Total_Revenue
## t = 4.6749, df = 7, p-value = 0.002274
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4885260 0.9723908
## sample estimates:
## cor
## 0.8702901
In the plots above, Capital Expenditure as a percentage of Total Expenditure shows a relatively high share of capital investment, peaking around 2014, followed by a sustained decline after 2016. Total Revenue and Total Expenditure are also strongly correlated (r = 0.87, p = 0.002274), with both peaking around 2016 and falling afterwards.
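The t statistic and p-value in the `cor.test` output can be reproduced from the correlation coefficient and the sample size alone. A minimal sketch using the figures printed above; the variable names are illustrative:

```r
# Figures copied from the cor.test output
r <- 0.8702901
n <- 9

# t statistic for a Pearson correlation, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(c(t = t_stat, p = p_val), 4))
```

This makes explicit that, with nine observations, a correlation needs to be large before it reaches significance, which this one is.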
# Revenue Per Capita
Cleaned_TMA_Data$Total_Revenue_Per_Capita <- Cleaned_TMA_Data$Total_Revenue / Cleaned_TMA_Data$Population
Cleaned_TMA_Data$IGF_Per_Capita <- Cleaned_TMA_Data$IGF / Cleaned_TMA_Data$Population
Cleaned_TMA_Data$DACF_Per_Capita <- Cleaned_TMA_Data$DACF / Cleaned_TMA_Data$Population
# Time Series Plots (Improved)
# Total Revenue and Expenditure Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
geom_point(aes(y = DACF, color = "DACF")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure"), size = 1) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
geom_point(aes(y = Others_Sources, color = "Other Sources")) +
labs(
title = "Revenue and Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "blue",
"Other Sources" = "skyblue",
"IGF" = "green",
"DACF" = "darkgray",
"Capital Expenditure" = "purple",
"Total Expenditure" = "red",
"Recurrent Expenditure" = "yellow"
)
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Population Trend
ggplot(Cleaned_TMA_Data, aes(x = Year, y = Population)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Population Trend Over Years",
x = "Year",
y = "Population"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_TMA_Data, aes(x = Year, y = IGF_TE)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Per capita plot
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
scale_y_continuous(labels = comma)
cor_matrix <- cor(Cleaned_TMA_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "CapExp_Pct", "IGF")], use = "complete.obs")
print(cor_matrix)
## Population Total_Revenue Total_Expenditure IGF_TE
## Population 1.00000000 -0.01654634 -0.03275889 -0.11415765
## Total_Revenue -0.01654634 1.00000000 0.87029014 -0.38486443
## Total_Expenditure -0.03275889 0.87029014 1.00000000 -0.77523076
## IGF_TE -0.11415765 -0.38486443 -0.77523076 1.00000000
## CapExp_Pct 0.33314280 -0.41553743 -0.27870408 -0.06141442
## IGF -0.24944527 0.94340185 0.82900480 -0.31458341
## CapExp_Pct IGF
## Population 0.33314280 -0.2494453
## Total_Revenue -0.41553743 0.9434019
## Total_Expenditure -0.27870408 0.8290048
## IGF_TE -0.06141442 -0.3145834
## CapExp_Pct 1.00000000 -0.5982153
## IGF -0.59821527 1.0000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")
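The correlation matrix shows a strong positive correlation between total revenue and total expenditure (r = 0.87). IGF is also strongly positively correlated with both total revenue (r = 0.94) and total expenditure (r = 0.83).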
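Strong pairs can also be pulled out of a correlation matrix programmatically instead of by eye. A minimal sketch, using a four-variable subset of the values printed above; the 0.8 cut-off and the object names are illustrative assumptions:

```r
# Subset of the printed correlation matrix (values copied from the output)
vars <- c("Population", "Total_Revenue", "Total_Expenditure", "IGF")
m <- matrix(c(
   1.00000000, -0.01654634, -0.03275889, -0.24944527,
  -0.01654634,  1.00000000,  0.87029014,  0.94340185,
  -0.03275889,  0.87029014,  1.00000000,  0.82900480,
  -0.24944527,  0.94340185,  0.82900480,  1.00000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Positions in the upper triangle whose absolute correlation exceeds 0.8
strong <- which(abs(m) > 0.8 & upper.tri(m), arr.ind = TRUE)

strong_pairs <- data.frame(
  var1 = rownames(m)[strong[, "row"]],
  var2 = colnames(m)[strong[, "col"]],
  r    = m[strong]
)
print(strong_pairs)
```

Restricting the search to the upper triangle avoids listing each pair twice and skips the diagonal of ones.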
# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_TMA_Data)
summary(model_revenue_pop)
##
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12409733 -4647150 -1125595 6134998 10886395
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36772413.660 9208032.446 3.994 0.00523 **
## Population -1.471 33.608 -0.044 0.96630
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7763000 on 7 degrees of freedom
## Multiple R-squared: 0.0002738, Adjusted R-squared: -0.1425
## F-statistic: 0.001917 on 1 and 7 DF, p-value: 0.9663
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_TMA_Data)
summary(model_expenditure_pop)
##
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19725491 -828997 260211 5016575 8886541
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36033170.438 10690119.426 3.371 0.0119 *
## Population -3.384 39.018 -0.087 0.9333
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9012000 on 7 degrees of freedom
## Multiple R-squared: 0.001073, Adjusted R-squared: -0.1416
## F-statistic: 0.00752 on 1 and 7 DF, p-value: 0.9333
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TMA_Data)
summary(model_capital_rev_igf)
##
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3537240 -1751092 -268196 1891113 5421506
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21628659.817864 10620833.425263 2.036 0.0879 .
## Total_Revenue -0.001251 0.166573 -0.008 0.9943
## IGF_TE -20410550.193464 10294834.110031 -1.983 0.0947 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3158000 on 6 degrees of freedom
## Multiple R-squared: 0.434, Adjusted R-squared: 0.2453
## F-statistic: 2.3 on 2 and 6 DF, p-value: 0.1813
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_TMA_Data)
summary(model_igfte_pop_rev)
##
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.13417 -0.06928 -0.01362 0.08779 0.18710
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.917775277753 0.266663923178 3.442 0.0138 *
## Population -0.000000173477 0.000000537626 -0.323 0.7579
## Total_Revenue -0.000000006259 0.000000006045 -1.035 0.3404
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1242 on 6 degrees of freedom
## Multiple R-squared: 0.1627, Adjusted R-squared: -0.1165
## F-statistic: 0.5827 on 2 and 6 DF, p-value: 0.5871
# Visualizations
# Scatter plot: Total Revenue vs Population
ggplot(Cleaned_TMA_Data, aes(x = Population, y = Total_Revenue)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: Total Expenditure vs Population
ggplot(Cleaned_TMA_Data, aes(x = Population, y = Total_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: Capital Expenditure vs Total Revenue
ggplot(Cleaned_TMA_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: IGF_TE vs Population
ggplot(Cleaned_TMA_Data, aes(x = Population, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
ggplot(Cleaned_TMA_Data, aes(x = Total_Revenue, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
The regressions above show no statistically significant linear relationship between Total Revenue and Population (p = 0.9663), between Total Expenditure and Population (p = 0.9333), between Capital Expenditure and Total Revenue together with IGF_TE (overall p = 0.1813), or between IGF_TE and Population together with Total Revenue (overall p = 0.5871).
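The overall p-value at the bottom of each summary comes from the F-statistic and its two degrees of freedom. A minimal sketch using the rounded figures printed for `model_capital_rev_igf`; the variable names are illustrative:

```r
# Figures copied from the summary of model_capital_rev_igf
f_value <- 2.3
df_num  <- 2   # number of predictors
df_den  <- 6   # residual degrees of freedom

# Upper-tail probability of the F distribution gives the overall p-value
p_overall <- pf(f_value, df_num, df_den, lower.tail = FALSE)

print(round(p_overall, 4))
```

The result is close to the reported 0.1813; any small difference comes from the F value being rounded to 2.3 in the printout. On a fitted model, the unrounded inputs are available in `summary(model)$fstatistic`.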
# no variables
The permit fees column (Act_Permit) contains only NA values, so it cannot be included in this analysis.
# IGF Trend
ggplot(Cleaned_TMA_Data, aes(x = Year, y = IGF)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF Trend Over Years",
x = "Year",
y = "IGF (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_line(aes(y = Act_Fees, color = "Act Fees"), size = 1) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF and Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_line(aes(y = Act_Fees, color = "Act Fees"), size = 1) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "IGF vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in IGF and the land-based revenues over the years.
The Act_Permit (permit fees) column is NA for every year.
# IGF vs Land-Based Revenues
model_igf_land <- lm(IGF ~ Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
summary(model_igf_land)
##
## Call:
## lm(formula = IGF ~ Act_Property_Rates + Act_Stool_Lands + Act_Licenses +
## Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## 69875 1048965 668294 1110026 -1171010 -1080301 -257408 35609
## 9
## -424050
## attr(,"label")
## [1] "IGF"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3566249.3535 2891105.8491 1.234 0.28491
## Act_Property_Rates 0.8592 0.1566 5.486 0.00538 **
## Act_Stool_Lands 1.3844 0.7165 1.932 0.12551
## Act_Licenses 1.0979 0.5786 1.898 0.13060
## Act_Fees 0.2930 0.2727 1.075 0.34305
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1180000 on 4 degrees of freedom
## Multiple R-squared: 0.9636, Adjusted R-squared: 0.9271
## F-statistic: 26.45 on 4 and 4 DF, p-value: 0.003884
cor_matrix_land_igf <- cor(Cleaned_TMA_Data[, c("IGF", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_igf)
## IGF Act_Property_Rates Act_Stool_Lands Act_Licenses
## IGF 1.0000000 0.9275746 0.6081435 0.7217327
## Act_Property_Rates 0.9275746 1.0000000 0.3979615 0.5782723
## Act_Stool_Lands 0.6081435 0.3979615 1.0000000 0.4505038
## Act_Licenses 0.7217327 0.5782723 0.4505038 1.0000000
## Act_Fees 0.2410827 0.1896934 0.1317399 -0.1065127
## Act_Fees
## IGF 0.2410827
## Act_Property_Rates 0.1896934
## Act_Stool_Lands 0.1317399
## Act_Licenses -0.1065127
## Act_Fees 1.0000000
corrplot(cor_matrix_land_igf)
From the multiple regression of IGF on the land-based revenues (property rates, stool lands revenue, licenses, and fees), the overall model is statistically significant (p-value: 0.003884) with a high R-squared of 0.9636, meaning that 96.36% of the variation in IGF is explained by these four revenue sources (adjusted R-squared: 0.9271). However, property rates is the only individual term in the model that is significant (p = 0.00538).
The correlation matrix shows that IGF is strongly correlated with property rates (r = 0.93) and licenses (r = 0.72), moderately correlated with stool lands revenue (r = 0.61), and only weakly correlated with fees (r = 0.24).
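One way to see why only property rates reaches significance is to turn the printed estimates and standard errors into 95% confidence intervals. The sketch below copies the four slope rows from the coefficient table above and uses the 4 residual degrees of freedom reported in the summary; the object names (`coefs`, `t_crit`) are illustrative only.

```r
# Estimates and standard errors copied from the printed coefficient table.
coefs <- data.frame(
  term = c("Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees"),
  estimate = c(0.8592, 1.3844, 1.0979, 0.2930),
  std_error = c(0.1566, 0.7165, 0.5786, 0.2727)
)

# Two-sided 95% critical value for the 4 residual degrees of freedom.
t_crit <- qt(0.975, df = 4)

coefs$lower <- coefs$estimate - t_crit * coefs$std_error
coefs$upper <- coefs$estimate + t_crit * coefs$std_error

# An interval that excludes zero corresponds to p < 0.05 for that term.
coefs$excludes_zero <- coefs$lower > 0 | coefs$upper < 0
print(coefs)
```

With the fitted object available, `confint(model_igf_land)` should give the same intervals up to rounding.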
# Simple linear Regression Analysis
model_property <- lm(IGF ~ Act_Property_Rates, data = Cleaned_TMA_Data)
model_stool <- lm(IGF ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
model_license <- lm(IGF ~ Act_Licenses, data = Cleaned_TMA_Data)
model_acts <- lm(IGF ~ Act_Fees, data = Cleaned_TMA_Data)
# Visualizations
ggplot(Cleaned_TMA_Data, aes(x = Act_Property_Rates, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Property Rates", x = "Property Rates", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = IGF ~ Act_Property_Rates, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3578979 -328454 85912 844776 2344601
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10888314.3088 1783859.3630 6.104 0.000489 ***
## Act_Property_Rates 1.1700 0.1781 6.568 0.000313 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1746000 on 7 degrees of freedom
## Multiple R-squared: 0.8604, Adjusted R-squared: 0.8405
## F-statistic: 43.14 on 1 and 7 DF, p-value: 0.0003135
ggplot(Cleaned_TMA_Data, aes(x = Act_Stool_Lands, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = IGF ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5192465 -2634614 447489 2817828 4650878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13794299.239 4215999.208 3.272 0.0136 *
## Act_Stool_Lands 3.956 1.952 2.027 0.0823 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3710000 on 7 degrees of freedom
## Multiple R-squared: 0.3698, Adjusted R-squared: 0.2798
## F-statistic: 4.108 on 1 and 7 DF, p-value: 0.08229
ggplot(Cleaned_TMA_Data, aes(x = Act_Licenses, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Licenses", x = "Licenses", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = IGF ~ Act_Licenses, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3079279 -2184341 -551448 751237 6858673
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3405903.671 6812792.297 0.500 0.6325
## Act_Licenses 3.252 1.179 2.759 0.0281 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3235000 on 7 degrees of freedom
## Multiple R-squared: 0.5209, Adjusted R-squared: 0.4525
## F-statistic: 7.611 on 1 and 7 DF, p-value: 0.02815
ggplot(Cleaned_TMA_Data, aes(x = Act_Fees, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Act Fees", x = "Act Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = IGF ~ Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8870400 -2004978 220070 2140484 4981247
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19459720.5661 4098988.2482 4.747 0.00209 **
## Act_Fees 0.6435 0.9791 0.657 0.53204
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4536000 on 7 degrees of freedom
## Multiple R-squared: 0.05812, Adjusted R-squared: -0.07643
## F-statistic: 0.432 on 1 and 7 DF, p-value: 0.532
The simple linear regressions of IGF on each land-based revenue found the property rates model (p = 0.0003, R-squared = 0.8604) and the licenses model (p = 0.028, R-squared = 0.5209) to be statistically significant, each with a positive linear relationship with IGF. The stool lands (p = 0.082) and Act fees (p = 0.532) models are not significant at the 5% level.
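The simple regressions can be cross-checked against the correlation matrix printed earlier, because in a one-predictor model the R-squared equals the squared Pearson correlation. The short sketch below copies the IGF row of that matrix (`r_with_igf` is an illustrative name) and squares it.

```r
# Correlations of IGF with each predictor, copied from the printed matrix.
r_with_igf <- c(
  Act_Property_Rates = 0.9275746,
  Act_Stool_Lands = 0.6081435,
  Act_Licenses = 0.7217327,
  Act_Fees = 0.2410827
)

# In a one-predictor regression, R-squared equals the squared correlation.
r_squared <- r_with_igf^2
print(round(r_squared, 4))
```

The squared values agree with the Multiple R-squared lines of the four summaries above (0.8604, 0.3698, 0.5209, and 0.05812).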
# DACF Trend
ggplot(Cleaned_TMA_Data, aes(x = Year, y = DACF)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "DACF Trend Over Years",
x = "Year",
y = "DACF (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# DACF and Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
labs(
title = "DACF vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in DACF and the land-based revenues over the years.
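The trend plots in this report add one `geom_line()` per series. An alternative, sketched here on invented numbers, is to reshape the data to long format with `tidyr::pivot_longer()` so that a single layer draws every series. The data frame `toy`, its values, and the names `Series` and `Amount` are assumptions made for illustration.

```r
library(ggplot2)
library(tidyr)

# Toy data standing in for Cleaned_TMA_Data; all values are invented.
toy <- data.frame(
  Year = 2015:2018,
  DACF = c(2.0, 2.2, 1.9, 2.4),
  Act_Property_Rates = c(8, 9, 11, 12),
  Act_Licenses = c(5, 5.5, 6, 5.8)
)

# One row per (Year, series), so a single geom_line() covers all series.
toy_long <- pivot_longer(toy, cols = -Year, names_to = "Series", values_to = "Amount")

trend_plot <- ggplot(toy_long, aes(x = Year, y = Amount, color = Series)) +
  geom_line() +
  geom_point() +
  labs(title = "Toy trend plot", x = "Year", y = "Amount (toy units)", color = "Series")

print(trend_plot)
```

The legend is then built from the `Series` column, so adding or dropping a revenue type only changes the columns passed to `pivot_longer()`.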
# DACF vs Land-Based Revenues
model_DACF_land <- lm(DACF ~ Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
summary(model_DACF_land)
##
## Call:
## lm(formula = DACF ~ Act_Property_Rates + Act_Stool_Lands + Act_Licenses +
## Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9
## -142157 43785 441966 1366 -33924 -86455 192994 -533614 116039
## attr(,"label")
## [1] "DACF"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2586632.97674 917997.91864 2.818 0.0479 *
## Act_Property_Rates 0.12122 0.04973 2.437 0.0714 .
## Act_Stool_Lands -0.31185 0.22752 -1.371 0.2424
## Act_Licenses -0.09577 0.18372 -0.521 0.6297
## Act_Fees -0.13195 0.08659 -1.524 0.2022
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 374700 on 4 degrees of freedom
## Multiple R-squared: 0.6831, Adjusted R-squared: 0.3662
## F-statistic: 2.155 on 4 and 4 DF, p-value: 0.2376
cor_matrix_land_DACF <- cor(Cleaned_TMA_Data[, c("DACF", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_DACF)
## DACF Act_Property_Rates Act_Stool_Lands Act_Licenses
## DACF 1.0000000 0.5141565 -0.2395623 0.1670543
## Act_Property_Rates 0.5141565 1.0000000 0.3979615 0.5782723
## Act_Stool_Lands -0.2395623 0.3979615 1.0000000 0.4505038
## Act_Licenses 0.1670543 0.5782723 0.4505038 1.0000000
## Act_Fees -0.3275189 0.1896934 0.1317399 -0.1065127
## Act_Fees
## DACF -0.3275189
## Act_Property_Rates 0.1896934
## Act_Stool_Lands 0.1317399
## Act_Licenses -0.1065127
## Act_Fees 1.0000000
corrplot(cor_matrix_land_DACF)
The multiple regression of DACF on the land-based revenues (property rates, stool lands revenue, licenses, and fees) is not statistically significant (p-value: 0.2376). The R-squared of 0.6831 drops to an adjusted R-squared of 0.3662, which indicates that the model fits poorly. None of the individual terms is significant at the 5% level either.
The correlation matrix shows that DACF is only moderately correlated with property rates (r = 0.51) and weakly correlated with the other land-based revenues (|r| ≤ 0.33).
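Whether a correlation of this size is distinguishable from zero with only 9 yearly observations can be checked with the usual t-test for a Pearson correlation. The sketch below is a hand-rolled helper (`cor_p_value` is a hypothetical name) applied to the DACF and property rates correlation from the matrix above.

```r
# t-test for a Pearson correlation r computed from n observations.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, p = p)
}

# DACF vs Act_Property_Rates: r from the printed matrix, n = 9 years.
res <- cor_p_value(0.5141565, 9)
print(round(res, 4))
```

This is the same test as the slope test in the simple regression of DACF on property rates in this section, so the t value and p-value should match that summary. With the data loaded, `cor.test()` on the two columns performs the same calculation.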
# Simple linear Regression Analysis
model_property <- lm(DACF ~ Act_Property_Rates, data = Cleaned_TMA_Data)
model_stool <- lm(DACF ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
model_license <- lm(DACF ~ Act_Licenses, data = Cleaned_TMA_Data)
model_acts <- lm(DACF ~ Act_Fees, data = Cleaned_TMA_Data)
ggplot(Cleaned_TMA_Data, aes(x = Act_Property_Rates, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Property Rates", x = "Property Rates", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = DACF ~ Act_Property_Rates, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -509469 -275061 -132053 237203 751242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1369197.32559 440816.24637 3.106 0.0172 *
## Act_Property_Rates 0.06982 0.04402 1.586 0.1568
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 431500 on 7 degrees of freedom
## Multiple R-squared: 0.2644, Adjusted R-squared: 0.1593
## F-statistic: 2.515 on 1 and 7 DF, p-value: 0.1568
ggplot(Cleaned_TMA_Data, aes(x = Act_Stool_Lands, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = DACF ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -832174 -242817 38979 216434 650871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2376508.3152 555079.4308 4.281 0.00365 **
## Act_Stool_Lands -0.1677 0.2569 -0.653 0.53471
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 488500 on 7 degrees of freedom
## Multiple R-squared: 0.05739, Adjusted R-squared: -0.07727
## F-statistic: 0.4262 on 1 and 7 DF, p-value: 0.5347
ggplot(Cleaned_TMA_Data, aes(x = Act_Licenses, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Licenses", x = "Licenses", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = DACF ~ Act_Licenses, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -576007 -262991 -41789 213396 934033
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1567669.80567 1044671.27737 1.501 0.177
## Act_Licenses 0.08104 0.18077 0.448 0.667
##
## Residual standard error: 496100 on 7 degrees of freedom
## Multiple R-squared: 0.02791, Adjusted R-squared: -0.111
## F-statistic: 0.201 on 1 and 7 DF, p-value: 0.6675
ggplot(Cleaned_TMA_Data, aes(x = Act_Fees, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Act Fees", x = "Act Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = DACF ~ Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -608193 -287851 -128807 190656 806153
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2396278.53933 429589.88352 5.578 0.000835 ***
## Act_Fees -0.09411 0.10262 -0.917 0.389585
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 475400 on 7 degrees of freedom
## Multiple R-squared: 0.1073, Adjusted R-squared: -0.02026
## F-statistic: 0.8411 on 1 and 7 DF, p-value: 0.3896
The simple linear regressions of DACF on each land-based revenue found none of the models to be statistically significant.
# Capital_Expenditure Trend
ggplot(Cleaned_TMA_Data, aes(x = Year, y = Capital_Expenditure)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Capital Expenditure Trend Over Years",
x = "Year",
y = "Capital_Expenditure (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Capital_Expenditure and Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = Capital_Expenditure, color = "Capital_Expenditure"), size = 1) +
labs(
title = "Capital Exp. vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in capital expenditure and the land-based revenues over the years.
# Capital_Expenditure vs Land-Based Revenues
model_Capital_Expenditure_land <- lm(Capital_Expenditure ~ Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
summary(model_Capital_Expenditure_land)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -3756399 455561 7006915 -1704838 -594474 532783 3638735 -2967150
## 9
## -2611133
## attr(,"label")
## [1] "Capital Expenditure"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6530990.7975 11991700.2650 0.545 0.615
## Act_Property_Rates -0.1568 0.6496 -0.241 0.821
## Act_Stool_Lands 1.5988 2.9720 0.538 0.619
## Act_Licenses -0.1836 2.3999 -0.077 0.943
## Act_Fees 0.2903 1.1311 0.257 0.810
##
## Residual standard error: 4894000 on 4 degrees of freedom
## Multiple R-squared: 0.09363, Adjusted R-squared: -0.8127
## F-statistic: 0.1033 on 4 and 4 DF, p-value: 0.9753
cor_matrix_land_Capital_Expenditure <- cor(Cleaned_TMA_Data[, c("Capital_Expenditure", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Capital_Expenditure)
## Capital_Expenditure Act_Property_Rates Act_Stool_Lands
## Capital_Expenditure 1.00000000 -0.0353889 0.2312611
## Act_Property_Rates -0.03538890 1.0000000 0.3979615
## Act_Stool_Lands 0.23126109 0.3979615 1.0000000
## Act_Licenses -0.01622685 0.5782723 0.4505038
## Act_Fees 0.14661796 0.1896934 0.1317399
## Act_Licenses Act_Fees
## Capital_Expenditure -0.01622685 0.1466180
## Act_Property_Rates 0.57827232 0.1896934
## Act_Stool_Lands 0.45050383 0.1317399
## Act_Licenses 1.00000000 -0.1065127
## Act_Fees -0.10651266 1.0000000
corrplot(cor_matrix_land_Capital_Expenditure)
The multiple regression of Capital_Expenditure on the land-based revenues (property rates, stool lands revenue, licenses, and fees) is not statistically significant (p-value: 0.9753), with an R-squared of 0.09363 and an adjusted R-squared of -0.8127. None of the individual terms is significant either.
The correlation matrix shows that Capital_Expenditure is poorly correlated with all the land-based revenues (|r| ≤ 0.23).
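The adjusted R-squared printed in the Capital_Expenditure summary is negative, which can look odd at first. It follows directly from the standard adjustment formula when four predictors are fitted to nine observations, as the small sketch below shows (`adjusted_r2` is an illustrative helper, not a function from any package).

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values from the printed summary: R-squared = 0.09363, n = 9, p = 4.
adj <- adjusted_r2(0.09363, n = 9, p = 4)
print(round(adj, 4))
```

With n = 9 and p = 4 the penalty factor (n - 1) / (n - p - 1) is 2, so any model with an R-squared below 0.5 ends up with a negative adjusted value.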
# Simple linear Regression Analysis
model_property <- lm(Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_TMA_Data)
model_stool <- lm(Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
model_license <- lm(Capital_Expenditure ~ Act_Licenses, data = Cleaned_TMA_Data)
model_acts <- lm(Capital_Expenditure ~ Act_Fees, data = Cleaned_TMA_Data)
ggplot(Cleaned_TMA_Data, aes(x = Act_Property_Rates, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Property Rates", x = "Property Rates", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3585869 -3557900 -266236 1890056 7743316
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8781827.97059 3967418.71544 2.213 0.0625 .
## Act_Property_Rates -0.03712 0.39619 -0.094 0.9280
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3884000 on 7 degrees of freedom
## Multiple R-squared: 0.001252, Adjusted R-squared: -0.1414
## F-statistic: 0.008778 on 1 and 7 DF, p-value: 0.928
ggplot(Cleaned_TMA_Data, aes(x = Act_Stool_Lands, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3628577 -2379799 -267361 597454 7548337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5847262.242 4296450.429 1.361 0.216
## Act_Stool_Lands 1.251 1.989 0.629 0.549
##
## Residual standard error: 3781000 on 7 degrees of freedom
## Multiple R-squared: 0.05348, Adjusted R-squared: -0.08174
## F-statistic: 0.3955 on 1 and 7 DF, p-value: 0.5494
ggplot(Cleaned_TMA_Data, aes(x = Act_Licenses, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Licenses", x = "Licenses", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Licenses, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3612873 -3431683 -319712 1775473 7817501
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8777411.6076 8183229.9087 1.073 0.319
## Act_Licenses -0.0608 1.4160 -0.043 0.967
##
## Residual standard error: 3886000 on 7 degrees of freedom
## Multiple R-squared: 0.0002633, Adjusted R-squared: -0.1426
## F-statistic: 0.001844 on 1 and 7 DF, p-value: 0.967
ggplot(Cleaned_TMA_Data, aes(x = Act_Fees, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Act Fees", x = "Act Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3806903 -3588002 7569 1945734 7294359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7164192.4106 3474008.8638 2.062 0.0781 .
## Act_Fees 0.3254 0.8298 0.392 0.7066
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3844000 on 7 degrees of freedom
## Multiple R-squared: 0.0215, Adjusted R-squared: -0.1183
## F-statistic: 0.1538 on 1 and 7 DF, p-value: 0.7066
The simple linear regression analysis of the land-based revenues found none to be significant.
# Recurrent Expenditure Trend
ggplot(Cleaned_TMA_Data, aes(x = Year, y = Recrrent_Expenditure)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Recurrent Expenditure Trend ",
x = "Year",
y = "Recurrent Expenditure (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trend",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Recurrent Expenditure and Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent_Expenditure"), size = 1) +
labs(
title = "Recurrent Exp. vs. Land-Based Revenue Trend",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in recurrent expenditure and the land-based revenues over the years.
# Recurrent Expenditure vs Land-Based Revenues
model_recurrent_Expenditure_land <- lm(Recrrent_Expenditure ~ Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
summary(model_recurrent_Expenditure_land)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -1181523 2353021 2750286 3457167 1140170 -5341007 -202344 -3293918
## 9
## 318146
## attr(,"label")
## [1] "Recrrent Expenditure"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3083606.0937 10047358.8350 -0.307 0.774
## Act_Property_Rates 0.9350 0.5443 1.718 0.161
## Act_Stool_Lands 1.8949 2.4902 0.761 0.489
## Act_Licenses 0.9097 2.0108 0.452 0.674
## Act_Fees 0.1300 0.9477 0.137 0.898
##
## Residual standard error: 4101000 on 4 degrees of freedom
## Multiple R-squared: 0.7126, Adjusted R-squared: 0.4252
## F-statistic: 2.479 on 4 and 4 DF, p-value: 0.2003
cor_matrix_land_recurrent_Expenditure <- cor(Cleaned_TMA_Data[, c("Recrrent_Expenditure", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_recurrent_Expenditure)
## Recrrent_Expenditure Act_Property_Rates Act_Stool_Lands
## Recrrent_Expenditure 1.0000000 0.7946753 0.5526050
## Act_Property_Rates 0.7946753 1.0000000 0.3979615
## Act_Stool_Lands 0.5526050 0.3979615 1.0000000
## Act_Licenses 0.6115239 0.5782723 0.4505038
## Act_Fees 0.1666612 0.1896934 0.1317399
## Act_Licenses Act_Fees
## Recrrent_Expenditure 0.6115239 0.1666612
## Act_Property_Rates 0.5782723 0.1896934
## Act_Stool_Lands 0.4505038 0.1317399
## Act_Licenses 1.0000000 -0.1065127
## Act_Fees -0.1065127 1.0000000
corrplot(cor_matrix_land_recurrent_Expenditure)
The multiple regression of Recurrent Expenditure on the land-based revenues (property rates, stool lands revenue, licenses, and fees) is not statistically significant overall (p-value: 0.2003), with an R-squared of 0.7126 and an adjusted R-squared of 0.4252. None of the individual terms is statistically significant either.
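The overall p-value quoted above comes from the F-test of the whole model, which can be rebuilt from the R-squared alone. The sketch below uses the printed R-squared with n = 9 observations and p = 4 predictors; `overall_f` is a hypothetical helper written for this illustration.

```r
# Overall F statistic and p-value from R-squared, n observations, p predictors.
overall_f <- function(r2, n, p) {
  df1 <- p
  df2 <- n - p - 1
  f_stat <- (r2 / df1) / ((1 - r2) / df2)
  p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
  c(f_stat = f_stat, df1 = df1, df2 = df2, p_value = p_value)
}

# R-squared taken from the printed Recurrent Expenditure summary.
res_f <- overall_f(0.7126, n = 9, p = 4)
print(round(res_f, 4))
```

Because the residual degrees of freedom are only 4, even an R-squared above 0.7 does not produce a significant F statistic here.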
# Simple linear Regression Analysis
model_property <- lm(Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_TMA_Data)
model_stool <- lm(Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
model_license <- lm(Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_TMA_Data)
model_acts <- lm(Recrrent_Expenditure ~ Act_Fees, data = Cleaned_TMA_Data)
ggplot(Cleaned_TMA_Data, aes(x = Act_Property_Rates, y = Recrrent_Expenditure))+
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Property Rates", x = "Property Rates", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3793400 -3521930 352276 2695420 4156772
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3638040.6124 3585646.5062 1.015 0.3441
## Act_Property_Rates 1.2402 0.3581 3.464 0.0105 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3510000 on 7 degrees of freedom
## Multiple R-squared: 0.6315, Adjusted R-squared: 0.5789
## F-statistic: 12 on 1 and 7 DF, p-value: 0.0105
ggplot(Cleaned_TMA_Data, aes(x = Act_Stool_Lands, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6838307 -1321199 -47605 2525343 5459296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6193148.822 5476399.733 1.131 0.295
## Act_Stool_Lands 4.447 2.535 1.754 0.123
##
## Residual standard error: 4819000 on 7 degrees of freedom
## Multiple R-squared: 0.3054, Adjusted R-squared: 0.2061
## F-statistic: 3.077 on 1 and 7 DF, p-value: 0.1228
ggplot(Cleaned_TMA_Data, aes(x = Act_Licenses, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Licenses", x = "Licenses", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4852621 -2931073 -12744 833653 9724135
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4076559.720 9635100.583 -0.423 0.6849
## Act_Licenses 3.409 1.667 2.045 0.0801 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4575000 on 7 degrees of freedom
## Multiple R-squared: 0.374, Adjusted R-squared: 0.2845
## F-statistic: 4.181 on 1 and 7 DF, p-value: 0.08014
ggplot(Cleaned_TMA_Data, aes(x = Act_Fees, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Act Fees", x = "Act Fees", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10796981 -2163810 -527576 2101513 7987102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13235777.1040 5152366.7063 2.569 0.0371 *
## Act_Fees 0.5504 1.2308 0.447 0.6682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5701000 on 7 degrees of freedom
## Multiple R-squared: 0.02778, Adjusted R-squared: -0.1111
## F-statistic: 0.2 on 1 and 7 DF, p-value: 0.6682
The simple linear regressions of Recurrent Expenditure on each land-based revenue found only property rates to be significant (p = 0.0105, R-squared = 0.6315); the rest are not.
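Property rates are significant on their own but not in the four-predictor model reported earlier. One optional check, which this report does not perform, is whether the predictors overlap enough to inflate the standard errors. The sketch below computes variance inflation factors (VIFs) by hand from the predictor correlations printed in the correlation matrix; the names `preds`, `R`, and `vif` are illustrative.

```r
# Correlations among the four predictors, copied from the printed matrix.
preds <- c("Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")
R <- matrix(
  c(1.0000000, 0.3979615,  0.5782723,  0.1896934,
    0.3979615, 1.0000000,  0.4505038,  0.1317399,
    0.5782723, 0.4505038,  1.0000000, -0.1065127,
    0.1896934, 0.1317399, -0.1065127,  1.0000000),
  nrow = 4, byrow = TRUE, dimnames = list(preds, preds)
)

# Each VIF is the matching diagonal entry of the inverse correlation matrix.
vif <- diag(solve(R))
print(round(vif, 2))
```

VIFs close to 1 would suggest that collinearity is not the main issue, in which case the loss of significance is more plausibly due to the 4 residual degrees of freedom left after fitting four predictors to nine observations.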
# Population Trend
ggplot(Cleaned_TMA_Data, aes(x = Year, y = Population)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Population Trend Over Years",
x = "Year",
y = "Population"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Population and Land-Based Revenue Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Fees"), size = 1) +
geom_line(aes(y = Population, color = "Population"), size = 1) +
labs(
title = "Population vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Population / Revenue (Ghana Cedis)",
color = "Series"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show how population and each land-based revenue changed over the years.
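Because population (hundreds of thousands) and revenue (millions of cedis) share one axis in the combined plot, one option is to rescale every series to an index of 100 in the first year so that relative growth can be compared directly. The sketch below is only an illustration: `toy_trend`, its years, and its values are hypothetical stand-ins, not the figures in `Cleaned_TMA_Data`.

```r
# Hypothetical stand-in data (years and values are made up for illustration)
toy_trend <- data.frame(
  Year               = 2015:2023,
  Population         = seq(170000, 350000, length.out = 9),
  Act_Property_Rates = seq(6e6, 12e6, length.out = 9)
)

series <- c("Population", "Act_Property_Rates")

# Rescale each series so that its first-year value equals 100
indexed <- toy_trend
indexed[series] <- lapply(toy_trend[series], function(x) 100 * x / x[1])

# Reshape to long format: one row per Year and series
indexed_long <- reshape(
  indexed,
  direction = "long",
  varying   = series,
  v.names   = "index",
  timevar   = "series",
  times     = series,
  idvar     = "Year"
)

head(indexed_long)
```

The long data frame can then be passed to `ggplot()` with `aes(x = Year, y = index, color = series)` and `geom_line()`, in the same way as the plots above.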
# Population vs Land-Based Revenues
model_Population_land <- lm(Population ~ Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
summary(model_Population_land)
##
## Call:
## lm(formula = Population ~ Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9
## -27746 48612 36301 78482 50634 -125587 -15009 -40751 -4936
## attr(,"label")
## [1] "Population"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 300617.892441 215189.733599 1.397 0.235
## Act_Property_Rates -0.014990 0.011657 -1.286 0.268
## Act_Stool_Lands 0.056073 0.053333 1.051 0.352
## Act_Licenses -0.006788 0.043065 -0.158 0.882
## Act_Fees 0.006975 0.020297 0.344 0.748
##
## Residual standard error: 87830 on 4 degrees of freedom
## Multiple R-squared: 0.4216, Adjusted R-squared: -0.1568
## F-statistic: 0.7289 on 4 and 4 DF, p-value: 0.6166
cor_matrix_land_Population <- cor(Cleaned_TMA_Data[, c("Population", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Population)
## Population Act_Property_Rates Act_Stool_Lands Act_Licenses
## Population 1.00000000 -0.4726415 0.1904391 -0.2555305
## Act_Property_Rates -0.47264154 1.0000000 0.3979615 0.5782723
## Act_Stool_Lands 0.19043910 0.3979615 1.0000000 0.4505038
## Act_Licenses -0.25553053 0.5782723 0.4505038 1.0000000
## Act_Fees 0.08860518 0.1896934 0.1317399 -0.1065127
## Act_Fees
## Population 0.08860518
## Act_Property_Rates 0.18969337
## Act_Stool_Lands 0.13173987
## Act_Licenses -0.10651266
## Act_Fees 1.00000000
corrplot(cor_matrix_land_Population)
The multiple regression of Population on the land-based revenues (property rates, stool lands revenue, licenses, and fees) is not statistically significant overall (F-statistic: 0.7289, p-value: 0.6166). The R-squared of 0.4216 and the Adjusted R-squared of -0.1568 indicate a poor model fit. None of the individual terms is significant either.
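As a quick arithmetic check, the adjusted R-squared and the F-statistic can be reproduced from the reported R-squared alone, assuming n = 9 observations and p = 4 predictors (consistent with the 4 residual degrees of freedom in the summary above). This is a sketch using the rounded value printed in the output, so the last digits may differ slightly.

```r
r2 <- 0.4216  # Multiple R-squared as printed by summary(model_Population_land)
n  <- 9       # number of observations (assumed from the 9 residuals shown)
p  <- 4       # number of predictors in the model

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic and its p-value on (p, n - p - 1) degrees of freedom
f_stat  <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, f_stat = f_stat, p_value = p_value), 4)
```

With only 4 residual degrees of freedom, the penalty factor (n - 1) / (n - p - 1) equals 2, which is why a moderate R-squared turns into a negative adjusted R-squared.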
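The correlation matrix shows that Population is only weakly correlated with stool lands revenue (0.19), licenses (-0.26), and fees (0.09); its strongest association is a moderate negative correlation with property rates (-0.47).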
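To judge whether correlations of this size are distinguishable from zero with only nine observations, each coefficient can be converted to a t-statistic on n - 2 degrees of freedom. The sketch below uses the Population row of the printed matrix and assumes n = 9; the vector name `r` is introduced here purely for illustration.

```r
# Correlations of Population with each revenue, copied from the matrix above
r <- c(
  Act_Property_Rates = -0.4726415,
  Act_Stool_Lands    =  0.1904391,
  Act_Licenses       = -0.2555305,
  Act_Fees           =  0.08860518
)
n <- 9  # number of observations (assumed)

# t-statistic for H0: correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value
p_val <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

round(cbind(r, t_stat, p_val), 4)
```

These are the same t-statistics and p-values that a simple linear regression of Population on each revenue reports for its slope, since with a single predictor the two tests are equivalent.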
# Simple Linear Regression Analysis
model_property <- lm(Population ~ Act_Property_Rates, data = Cleaned_TMA_Data)
model_stool <- lm(Population ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
model_license <- lm(Population ~ Act_Licenses, data = Cleaned_TMA_Data)
model_acts <- lm(Population ~ Act_Fees, data = Cleaned_TMA_Data)
ggplot(Cleaned_TMA_Data, aes(x = Act_Property_Rates, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Property Rates", x = "Property Rates", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_property)
##
## Call:
## lm(formula = Population ~ Act_Property_Rates, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -76742 -62789 -26800 56784 124473
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 368352.718372 78588.864134 4.687 0.00224 **
## Act_Property_Rates -0.011136 0.007848 -1.419 0.19886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 76930 on 7 degrees of freedom
## Multiple R-squared: 0.2234, Adjusted R-squared: 0.1124
## F-statistic: 2.014 on 1 and 7 DF, p-value: 0.1989
ggplot(Cleaned_TMA_Data, aes(x = Act_Stool_Lands, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_stool)
##
## Call:
## lm(formula = Population ~ Act_Stool_Lands, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -111752 -78253 58458 63570 85148
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 215154.80588 97387.45637 2.209 0.0629 .
## Act_Stool_Lands 0.02314 0.04508 0.513 0.6236
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 85700 on 7 degrees of freedom
## Multiple R-squared: 0.03627, Adjusted R-squared: -0.1014
## F-statistic: 0.2634 on 1 and 7 DF, p-value: 0.6236
ggplot(Cleaned_TMA_Data, aes(x = Act_Licenses, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Licenses", x = "Licenses", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_license)
##
## Call:
## lm(formula = Population ~ Act_Licenses, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -114167 -67593 14300 73616 83555
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 385667.91755 177745.41062 2.170 0.0666 .
## Act_Licenses -0.02151 0.03076 -0.699 0.5069
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 84400 on 7 degrees of freedom
## Multiple R-squared: 0.0653, Adjusted R-squared: -0.06823
## F-statistic: 0.489 on 1 and 7 DF, p-value: 0.5069
ggplot(Cleaned_TMA_Data, aes(x = Act_Fees, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Act Fees", x = "Act Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
summary(model_acts)
##
## Call:
## lm(formula = Population ~ Act_Fees, data = Cleaned_TMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -87390 -86279 43770 69897 89597
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 245749.710113 78581.018290 3.127 0.0167 *
## Act_Fees 0.004418 0.018771 0.235 0.8207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 86960 on 7 degrees of freedom
## Multiple R-squared: 0.007851, Adjusted R-squared: -0.1339
## F-statistic: 0.05539 on 1 and 7 DF, p-value: 0.8207
The simple linear regression analysis of Population on each land-based revenue found none of them to be statistically significant.
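Rather than reading four separate summaries, the slope, p-value, and R-squared of each simple regression could be gathered into one table. The following is a minimal sketch on simulated data: `toy_data` is a hypothetical stand-in with the same column names as `Cleaned_TMA_Data`, and its values are random, so the numbers it prints are not the study's results.

```r
# Hypothetical stand-in for Cleaned_TMA_Data (simulated values, not real data)
set.seed(42)
toy_data <- data.frame(
  Population         = rnorm(9, mean = 260000, sd = 80000),
  Act_Property_Rates = rnorm(9, mean = 9e6, sd = 3e6),
  Act_Stool_Lands    = rnorm(9, mean = 2e6, sd = 7e5),
  Act_Licenses       = rnorm(9, mean = 5.5e6, sd = 1e6),
  Act_Fees           = rnorm(9, mean = 4e6, sd = 1.5e6)
)

predictors <- setdiff(names(toy_data), "Population")

# Fit one simple regression per predictor and keep the slope statistics
slope_table <- do.call(rbind, lapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "Population"), data = toy_data)
  est <- coef(summary(fit))[v, ]
  data.frame(
    predictor = v,
    slope     = est[["Estimate"]],
    p_value   = est[["Pr(>|t|)"]],
    r_squared = summary(fit)$r.squared
  )
}))

# Flag slopes significant at the conventional 5% level
slope_table$significant <- slope_table$p_value < 0.05
print(slope_table)
```

Replacing `toy_data` with the real data frame would give a single table from which the significance statement can be read directly.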